Fixes to per-layer lr-scale by RaymondLi0 · Pull Request #243 · ServiceNow/Fast-LLM

RaymondLi0 · 2025-04-29T19:18:28Z

✨ Description

Specify different lr-scales per layer.
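
As a rough illustration of the idea (not the Fast-LLM implementation), a per-layer lr-scale can be thought of as a list of multipliers applied to the base learning rate, one per layer. The helper name `resolve_layer_lr`, its signature, and the convention that `None` means "no scaling" are assumptions made for this sketch. The length check mirrors the "add check for length of per_layer_lr_scale" commit in spirit only.

```python
from typing import Optional, Sequence


def resolve_layer_lr(
    base_lr: float,
    per_layer_lr_scale: Optional[Sequence[Optional[float]]],
    num_layers: int,
) -> list[float]:
    """Return the effective learning rate of each layer (hypothetical helper)."""
    if per_layer_lr_scale is None:
        # No per-layer scaling configured: every layer uses the base rate.
        return [base_lr] * num_layers
    if len(per_layer_lr_scale) != num_layers:
        # Assumed check: one scale entry is expected per layer.
        raise ValueError(
            f"Expected {num_layers} lr scales, got {len(per_layer_lr_scale)}"
        )
    # A missing entry (None) is treated as a scale of 1.0 in this sketch.
    return [
        base_lr * (1.0 if scale is None else scale)
        for scale in per_layer_lr_scale
    ]


# A scale of 0.0 yields a zero learning rate, i.e. the layer is not updated.
layer_lrs = resolve_layer_lr(1e-3, [1.0, 0.5, 0.0], num_layers=3)
```

In a real optimizer setup, each resulting rate would typically be attached to the parameter group of the corresponding layer.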

oleksost · 2025-06-12T13:51:12Z

I think this can be closed as this has been merged in #258

RaymondLi0 · 2025-06-16T17:32:06Z

Sorry for the delay on this one. @oleksost could you have another look?
Seems some fixes were not included in #258

oleksost · 2025-06-16T18:54:18Z

Thanks for finding these bugs! These should be merged asap I think

jlamypoirier · 2025-06-16T19:40:34Z

                        self._tensor_space,
                        # TODO MTP: which index?
-                        layer_index=max(self._config.transformer.num_layers, 1),
+                        layer_index=max(self._config.transformer.num_layers + i, 1),


jlamypoirier: This will have unintended consequences on the initialization scale.

RaymondLi0: This only affects the prediction heads for i > 0 (thus not the next-token prediction head).

jlamypoirier: Yes, but the layer index is used elsewhere. It looks like it's only used in the backup attention regularization, though, so it doesn't matter much: https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/layers/transformer/attention.py#L181. I got mixed up with num_layers, which does matter for initialization.
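
To make the effect of the one-line diff concrete, here is a small sketch of how the layer index assigned to each prediction head changes. Only the two `max(...)` expressions come from the diff. The function names and the assumption that `i` runs over `range(prediction_heads)`, with `i == 0` being the next-token head, are hypothetical.

```python
def head_layer_indices_before(num_layers: int, prediction_heads: int) -> list[int]:
    """Old expression: every prediction head gets the same layer index."""
    return [max(num_layers, 1) for _ in range(prediction_heads)]


def head_layer_indices_after(num_layers: int, prediction_heads: int) -> list[int]:
    """New expression: head i is offset by i, so each head gets its own index."""
    return [max(num_layers + i, 1) for i in range(prediction_heads)]


before = head_layer_indices_before(num_layers=12, prediction_heads=3)  # [12, 12, 12]
after = head_layer_indices_after(num_layers=12, prediction_heads=3)    # [12, 13, 14]
```

Under these assumptions the index for `i == 0` is unchanged, which matches the remark that only the heads with i > 0 are affected.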

RaymondLi0 added 2 commits April 25, 2025 21:43

add per-layer lr-scale

9ddfb69

add token-prediction loss coefficients

77ad39f

oleksost reviewed May 5, 2025

Comment thread fast_llm/layers/transformer/config.py Outdated

RaymondLi0 added 2 commits May 5, 2025 21:50

disable freezing

41d4da3

layer-lr scale for mlp as well

9c4f38f

RaymondLi0 mentioned this pull request May 7, 2025

[bug] Resuming experiment in distributed format with frozen weights #256

Closed

oleksost added a commit that referenced this pull request May 12, 2025

merged also prediction_loss_coefficient from #243

9af5ee5

RaymondLi0 added 3 commits May 12, 2025 20:41

add check for length of per_layer_lr_scale

6fe2b6d

re-enable freezing

83baeef

pass layer-index to mlp

e834be7

oleksost mentioned this pull request May 14, 2025

[bug] test_checkpoint test not passing when any lr scale is set to 0 #265

Closed

jlamypoirier changed the title from "For visibility: add per-layer lr-scale" to "[Prototype] For visibility: add per-layer lr-scale" Jun 4, 2025

RaymondLi0 added 2 commits June 16, 2025 13:23

Merge branch 'main' into raymond/per_layer_lr_scale

86e62c8

remove comments

f2f5265

RaymondLi0 marked this pull request as ready for review June 16, 2025 17:31

RaymondLi0 changed the title from "[Prototype] For visibility: add per-layer lr-scale" to "Fixes to per-layer lr-scale" Jun 16, 2025

oleksost self-requested a review June 16, 2025 18:53

oleksost approved these changes Jun 16, 2025

remove per_layer_lr_scale in transformer config (already in llmblock)

6622040

RaymondLi0 merged commit 1371e47 into main Jun 16, 2025
4 checks passed

RaymondLi0 deleted the raymond/per_layer_lr_scale branch June 16, 2025 19:27

jlamypoirier reviewed Jun 16, 2025

Fixes to per-layer lr-scale #243: RaymondLi0 merged 10 commits into main from raymond/per_layer_lr_scale
